ABSTRACT


Research Data Management (RDM) has become increasingly important for more and more academic 
institutions. Using the Peking University Open Research Data Repository (PKU-ORDR) project as an example, 
this paper will review a library-based university-wide open research data repository project and related RDM 
services implementation process including project kickoff, needs assessment, partnerships establishment, 
software investigation and selection, software customization, as well as data curation services and training. 
Through the review, some issues revealed during the stages of the implementation process are also discussed 
and addressed in the paper such as awareness of research data, demands from data providers and users, data 
policies and requirements from home institution, requirements from funding agencies and publishers, the 
collaboration between administrative units and libraries, and concerns from data providers and users. The 
significance of the study is that the paper shows an example of creating an Open Data repository and RDM 
services for other Chinese academic libraries planning to implement their RDM services for their home 
institutions. The authors have also observed that, since the PKU-ORDR and RDM services were implemented
in 2015, the Peking University Library (PKUL) has supported numerous researchers across the entire research
life cycle, enhanced Open Science (OS) practices on campus, and influenced the national OS
movement in China through various national events and activities hosted by the PKUL.
1. INTRODUCTION


Open Science (OS) has made science more efficient, reliable, and responsive to societal challenges and 
reshaped scholarly communication landscapes [1]. Open access to research data is regarded as a driver 
for OS [2]. The scientific research community believes that data should be open, accessible, and reusable. 
Data sharing and reuse help strengthen researchers’ and institutional data stewardship [3,4,5]. To better 
foster OS, Open Data (OD), and transformation of scholarly communication, more and more academic 
libraries have provided or plan to provide library-based research data management (RDM) services for their 
home institutions. Tenopir et al.’s study [6] suggested that academic libraries be ideal centers for research 
data service activities on campuses, providing unique opportunities for academic libraries to become even 
more active participants in the knowledge creation cycle in their institutions. More recent studies agreed 
that academic libraries and academic librarians, as active stakeholders, have been playing a significant role 
in fostering open access movement, transforming scholarly communication landscapes, and facilitating 
RDM services [6,7,8,9,10,11,12]. 


Although RDM has developed rapidly in research universities around the world, RDM in Chinese
academic libraries is still at an early stage of development. For example, re3data.org (the Registry
of Research Data Repositories) listed more than 2,000 research data repositories in 2019, of which
only 42 were related to China. re3data.org is an OS tool that offers
researchers, funding agencies, libraries, and publishers an overview of existing international repositories
for research data [13].


Peking University Library (PKUL), as a top research university library in China, has long been actively 
seeking opportunities to raise awareness, foster collaboration, and initiate projects from both inside and 
outside of the University. The PKUL also has been actively involved in numerous national and international 
endeavors to foster open access movement and transform scholarly communication for the past decade. 
Since 2010, the PKUL has implemented the following initiatives to support the dynamic changing 
environment of the scholarly communication in the University: Peking University (PKU) Institutional 
Repository in 2010, PKU Open Journals in 2015, Scholars @ PKU in 2015, and PKU Open Research Data 
Repository (PKU-ORDR) in 2015. Particularly, the PKU-ORDR was created in 2015 for facilitating more 
effective and efficient data preservation, data sharing and reuse, and providing incentives for making data 
readily accessible to researchers and for the general public. 


This paper will review how the PKUL implemented the PKU-ORDR project and RDM services to foster 
and support OS and OD on campus and impact Chinese OS communities. The implementation phases 
being reviewed include project kickoff and needs assessment, partnerships establishment, software 
investigation and selection, software localization and customization, as well as the implementation of RDM
policies and services. Some issues revealed during the stages of the implementation process will also be
discussed and addressed in the paper, such as awareness of research data, demand from data providers and
users, data policies and requirements from the home institution, funding agencies, and publishers, the
collaboration between administrative units and libraries, and concerns from data providers and users.


The significance of the study is that PKU-ORDR shows a successful example of creating an OD repository 
and RDM services for other Chinese academic libraries planning to implement their RDM services for their 
home institutions in the future. The authors have also observed that, since the PKU-ORDR and RDM
services were implemented in 2015, the PKUL has supported numerous researchers across the entire research
life cycle, enhanced OS practices on campus, and influenced the national OS movement in China
by hosting various national events and activities.


2. RELATED WORKS 


The related works will focus on some aspects that this paper will address such as RDM and academic 
libraries & librarians, RDM and open research data repositories and systems, collaborations between 
research units and libraries, service support and promotions, repository implementation, data curation, and 
research support staff’s or librarians’ skills training. 


Tenopir et al. [6] pointed out that as science becomes more collaborative, data-intensive, and computational,
academic researchers face a series of data management needs. Meanwhile, Moon’s study [14] shows
that research funding agencies require researchers to provide data management plans (DMPs) when they apply for a grant, and
publishers also require researchers to provide data when publishing research results. Curdt’s study [15]
indicated that science conducted in cross-institutional, interdisciplinary, and long-term research projects 
requires active sharing of data, documents, and further information. Thus, RDM services should be 
established to support all researchers during their entire individual research studies. 


Tenopir et al. [6] also claimed that academic libraries may be ideal centers for RDM service activities 
on campuses. Cox et al. [10] reported an international study of RDM activities, services, and capabilities 
in higher education libraries. Their study found that libraries have provided leadership in RDM, particularly 
in advocacy and policy development. However, services provided by libraries are still limited, focused 
especially on advisory and consultancy services. Tripathi et al. [16] studied the RDM services implemented 
by different university libraries in India for managing, organizing, curating, and preserving research data 
generated at their universities’ departments and laboratories for data reuse and sharing and suggested a 
model for the university libraries to follow for actually deploying RDM services. 


Johnston et al. [17] compared six institutions’ RDM support levels within the Data Curation Network 
project and developed a shared staffing model for data curation across multiple institutions to support their 
researchers to meet their data-sharing goals through library-based data repository and curation services. 
Lee et al. [18] interviewed staff at several American university institutional repositories (IRs) and then provided
a rich, qualitative description of research data curation and use practices in IRs. In particular, Lee et al.
identified data curation and use activities in IRs, as well as IR structures, roles played, skills needed,
contradictions and problems exposed, solutions sought, and workarounds applied.


Curdt and Hoffmeister [19] shared their design and implementation of RDM services for a multidisciplinary 
and collaborative research project. McKinney et al. [20] described how Harvard University established a
diffraction data publication system, the Structural Biology Data Grid (SBDG①), to preserve primary
experimental data sets supporting scientific publications. All data sets published through the SBDG are
freely available to the research community under a public domain dedication license, with metadata
compliant with the DataCite Schema②. They also described how the SBDG collaborated with
the Institute for Quantitative Social Science at Harvard University to extend the Dataverse③ open-source
data repository system to structural biology data sets.


Mannheimer et al. [21] described how data repositories and academic libraries can partner with 
researchers to deal with challenges associated with qualitative data sharing and suggested that data 
repositories and academic libraries could help researchers address some of the challenges associated with 
ethical and lawful qualitative data sharing. Dovidonyté’s study [22] described the Lithuanian landscape of 
OS policies and institutional involvement in OS practices. The author also discussed prerequisites for 
sustainable and consistent OS implementation such as OS infrastructure, incentives for researchers, research 
assessment, and repositories’ compliance with the European Council requirements on a national level. 


Pontika [23] made an analysis and found that academic libraries have created some new academic 
librarians’ positions to support OS, OD, scholarly communication, and RDM on their campuses. However, 
researchers are still unfamiliar with RDM best practices, and research support staff, including librarians,
are faced with the difficulty of providing support to researchers across different disciplines and career
stages [24].


Alonso-Arévalo [25] agreed that the management of research data is one of the major challenges facing 
scientific and research libraries in the coming years. Already half of the American universities have a work 
plan on this issue, and all trend reports agree that RDM will be one of the priorities and future issues to 
be taken up by research libraries. Söderholm et al. [26] found that a network-based collaboration model
that fosters individuals’ interconnectedness is crucial for coping with the built-in dynamism of RDM. Tang
and Hu emphasized in their study [27] that for growing RDM services, institutional commitment to resources 
and training opportunities is crucial. As an emergent profession, data librarians need to be nurtured, 
mentored, and further trained. 


All of these studies have provided some theoretical, useful, and practical insights and examples for us 
and also showed us some challenges and issues in the RDM implementation process faced by researchers, 
academic libraries, and librarians. 


① data.sbgrid.org
② schema.datacite.org
③ dataverse.org
3. RDM IMPLEMENTATION
3.1 Kick-Off and Needs Assessment 


To kick off the PKU-ORDR project, the PKUL conducted a campus-wide survey in 2013 to gain a better
understanding of the RDM needs and requirements of researchers and research teams. The purpose
of the survey was to identify the real needs of researchers and collect data from them so that the PKUL could
create a strategic roadmap for building a framework or platform to meet those RDM needs. The
analysis results were summarized and published in the Journal of Library and Information Service [28]. The
survey focused on the following aspects: awareness and current practices of RDM including data preservation,
sharing, and reuse; description and features of research data; the current state of RDM; and expectations
of RDM services. The survey results showed that 87.5% of respondents were willing to share research data
under certain conditions. Their biggest motivation for sharing was that they
recognized the value of sharing data, the positive relation between data use and citations, data visibility,
and the credit awarded to data providers. However, the biggest concern for researchers was
plagiarism.


The PKUL also interviewed 23 research teams from multiple disciplines on campus. The face-to-face
communication with the teams helped discover more valuable information about the current state of RDM,
including long-term preservation, data sharing, and data reuse. Zhu et al. [28] summarized three major
findings from the interviews: (1) Research data sharing behavior is significantly influenced by discipline.
For example, biology is a data-driven and data-intensive discipline in which open access is already
a common best practice and data sharing standards and norms are already well established.
(2) An embargo period for data sharing is generally expected and required. Almost all researchers
interviewed emphasized that their data should be shared only after their results are formally published, which
addresses the concern of possible plagiarism. (3) Data sharing behavior is more spontaneous and passive
than active, and it lacks proper incentives and necessary maintenance, as well as a well-established mechanism
for data citation, recognition and credit, and feedback from data users.


The interview also revealed that the data management and needs of researchers in different disciplines 
vary greatly. Bioinformatics researchers need very large data storage so that the large amounts of process 
data generated from their experiments can be preserved. Researchers from Computer Science are willing
to share their data; however, Computer Science data sets are often very large, which makes data
sharing more expensive. For example, the volume of the Chinese Web data set collected by the Institute
of Network Computing and Information Systems in the past ten years is above 100 TB. Researchers from
Business hope they can obtain more valuable enterprise and government data that can be used in their 
classes and research. Researchers from the Institute of Social Science Survey (ISSS) hope to maximize the 
value of their survey data as much as possible through data sharing; however, the ISSS established a 
relatively strict user application procedure for users to access data. Faced with so many different data 
management needs of researchers, as an initial attempt, the PKUL analyzed the data needs based on 
priorities and decided to build an initial service infrastructure to meet the needs with the highest priorities. 
Because process data vary widely across disciplines, the PKUL decided to focus on data
closer to the final state and easier to share, and to collaborate with institutions inside and outside the campus
to build the PKU-ORDR, making data easier for PKU researchers to access, reuse, and share.


3.2 The Establishment of the Collaborative Model 


As one of the most active advocates of RDM at the University, the PKUL made numerous efforts to
convince various university research administrative units to invest in and support the creation of a
library-based RDM framework. The PKUL also sought potential partners within the University, since cooperation
and collaboration with administrative units and other units on campus are vital to the success
and sustainability of the project.


The PKUL finally selected the Institute of Social Science Survey (ISSS) as a working partner to cooperate 
and collaborate on the development of the RDM services. The ISSS was created to act as a social science 
data survey coordinator and interdisciplinary empirical research platform that enables Peking University as 
well as other research institutions around the world to study China’s social problems and conduct social 
science research, mainly through undertaking large-scale social survey projects and sharing the survey data 
openly. So the ISSS was an ideal collaborative candidate for the Library. The ISSS also plays a leading role 
on campus to provide workshops and training classes in data access, curation, and methods of analysis for 
the social science research community. 


In 2014, Peking University was awarded a grant by the National Natural Science Foundation of China
for the China Survey Data Archive (CSDA) project, which aimed to develop a data repository administrated 
by the University Management Science Data Center (MSDC), a department within the ISSS. This grant 
provided an opportunity for the PKUL to build a more collaborative relationship with the ISSS. With the 
assistance of the research administrative units such as the Office of Science Research and the Office of 
Social Science Research, the PKUL and the ISSS decided to work together on this project. Initially, the 
responsibilities were split as follows: The MSDC supervised by the ISSS was responsible for research data 
collection and cleaning-up, standardization and analysis, data repository platform testing, and feedback. 
The PKUL was responsible for requirements analysis, functional design, software selection, as well as the 
development and maintenance of data repository, data storage, classification and metadata, systems 
administration, and associated technical and technological services. 


However, the ISSS and the PKUL soon discovered, through analysis of the data collected from the
survey and interviews, that the project could be an opportunity to build a strong national showcase for OD,
because only a very limited number of subject-specific and/or research team-oriented data
storage facilities and data services were available at either the institutional or the national level. The initiative
was named the PKU-ORDR (PKU Open Research Data Repository) project and became a sub-project of the CSDA
project, with the goal of developing an infrastructure to help PKU researchers manage their data more
effectively and efficiently and of providing RDM services ranging from storage to consultation.


The strategic objectives of the PKU-ORDR are summarized below:
• To publish high-quality research data and disseminate academic outputs through an open platform;
• To promote OS, facilitate data sharing and reuse, and encourage the reproduction of research;
• To enable and track data citations and usage metrics;
• To explore data publishing and long-term preservation solutions;
• To foster innovation and cross-disciplinary integration.


In addition to the ISSS, the PKUL also cooperated with other internal units and external organizations to 
enrich the data content of the PKU-ORDR. Through collaboration with the Center for Bioinformatics of 
Peking University, the PKUL created linked data in the PKU-ORDR linking to the Bioinformatics database. 
Through cooperation with the Beijing Information Resources Management Center, the PKU-ORDR 
interoperated with the Beijing Government Data Resource System (BGDRS) so that the registered users in 
the PKU-ORDR can download data from the BGDRS directly. Through cooperation with the National 
Information Center, the PKU-ORDR collected some valuable enterprise data sets across the country. All 
these collaborations have greatly enriched the data content and expanded the disciplines’ scope of the 
PKU-ORDR. 


3.3 The Establishment of the Open Research Data Repository 


The first step of the PKU-ORDR was to create an open research data repository to meet the needs of data 
storage and data sharing. The establishment of the data repository includes selecting software as a framework 
and customizing the software. 


3.3.1 Software Selection 


Several types of RDM software were available at that time, including various institutional repositories
(IRs) that support RDM, as either open-source or proprietary solutions. The PKUL evaluated and
assessed various types of existing software including Dataverse, Data Conservancy, CKAN, Dryad, ICPSR,
GenBank, Figshare, and Nesstar. The implementation team also deployed and tested some open-source
solutions such as Dataverse, Data Conservancy, CKAN, and DSpace.


The implementation team adopted a software metrics tool and created criteria to evaluate and
assess these software solutions. Some general criteria were considered, such as business and industry
expertise, market knowledge, program/project management capabilities, methodology, communications,
and independence and objectivity. In addition, as shown in Table 1, four specific criteria were considered
in particular: ① Metadata standard and interoperability; ② Permissions management and access control;
③ DOI identifier and version management; ④ Online analysis and visualization. Notably, the
Dataverse metadata schema consists of a compulsory citation metadata block and multiple optional
discipline metadata blocks that can be easily customized. The default discipline metadata block is DDI for
Social Sciences, and Dataverse also provides several other discipline metadata blocks, such as
Biomedical, Geospatial, and Astronomy and Astrophysics. Therefore, the Dataverse metadata schema is flexible
enough to adapt, in theory, to any discipline.
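

The block structure described above can be illustrated with a minimal Python sketch. It assumes a simplified in-memory record: the field names, the `REQUIRED_CITATION_FIELDS` set, and the `validate_record` helper are hypothetical and do not reproduce the actual Dataverse schema or API.

```python
# Hypothetical, simplified model of a record with one compulsory citation
# block and any number of optional discipline blocks.
REQUIRED_CITATION_FIELDS = {"title", "author", "description", "subject"}  # assumed for illustration


def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    blocks = record.get("metadata_blocks", {})
    citation = blocks.get("citation")
    if citation is None:
        # The citation block is the only compulsory block in this sketch.
        problems.append("missing compulsory citation block")
    else:
        missing = sorted(REQUIRED_CITATION_FIELDS - citation.keys())
        problems.extend(f"citation block lacks '{name}'" for name in missing)
    return problems


record = {
    "metadata_blocks": {
        "citation": {
            "title": "Example survey data set",
            "author": "Example Research Team",
            "description": "Illustrative record only",
            "subject": "Social Sciences",
        },
        # Optional discipline block; other blocks could be added alongside it.
        "social_science": {"unit_of_analysis": "household"},
    }
}

print(validate_record(record))
```

Because discipline blocks are optional in this sketch, a record from any field passes validation as long as its citation block is complete, which mirrors the flexibility argument made above.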


Data Intelligence 195 


Research Data Management Implementation at Peking University Library: Foster and Promote
Open Science and Open Data


Table 1. Software comparisons. 


Software          Type                  Domain                                      Four specific criteria
Dataverse         Open source software  Multidisciplinary, mainly Social Sciences   Supporting ① based on DDI; supporting ②; supporting ③; supporting ④ based on TwoRavens
Data Conservancy  Open source software  Multidisciplinary                           Beta version: supporting ©; partially supporting ©; not supporting © or ©
CKAN              Open source software  Multidisciplinary, mainly government data   Partially supporting ©; not supporting ©, ©, and @
Dryad             Open source software  Multidisciplinary, mainly Bioscience        Supporting © based on expanded DC standard; not supporting © or ®; partially supporting ©
ICPSR             Proprietary software  Social Sciences                             Supporting ① based on DDI; supporting ②, ③, and ④
GenBank           Proprietary software  Bioinformatics                              Supporting © and @; partially supporting ©; not supporting ©
Figshare          Commercial software   Multidisciplinary                           Supporting ©; not supporting ©, ©, or @
Nesstar           Commercial software   Social Sciences                             Supporting ©, ©, and @; not supporting ©


Note: ① Metadata standard and interoperability; ② Permissions management and access control; ③ DOI identifier and version
management; ④ Online analysis and visualization.


After systematic comparison and assessment, Dataverse was finally chosen as the
development tool. Dataverse was originally developed by Harvard's Institute for Quantitative Social
Science (IQSS), along with many collaborators and contributors worldwide. As of August 7, 2020, it
had 59 installations around the world.


3.3.2 Software Customization 


Although Dataverse was chosen as the framework for the open research data repository, customization
was a challenge. The software development and version release phases are shown in Figure 1. The
project milestones are summarized below: (1) By the end of May 2015, the PKUL completed testing and
function building with the Chinese version of Dataverse v3.3 adopted from Fudan University. (2) Starting
in June 2015, the PKUL continued working on system architecture, localization and customization, and
refinement of functionality and features based on Dataverse 4.0, which had then been released; the PKU-ORDR
was formally launched in December 2015. Version 4.0 was used between June 2015 and July 2019. During
that time, Harvard University released more than ten minor versions that added multiple functions to the
framework, such as metadata harvesting, private URLs, and cloud storage support. (3) In July 2019, the PKUL
upgraded the platform to v4.14, adopting the changes made by Harvard University and fixing numerous bugs
in the early v4.0. The PKUL also completed function customization and data migration
as part of the upgrade.
[Figure 1 content: Dataverse 3.3 → Dataverse 4.0 → Dataverse 4.14]


Figure 1. Version development of PKU data repository. 


The highlights of our local customization are: (1) user management, (2) a bilingual interface, (3) usage
statistics, (4) data contests, (5) other functions such as DataCite DOI registration and data set related publications,
and (6) a custom home page.


To enhance user management, the PKUL integrated the PKU-IAAA single sign-on system into
the platform so that users can authenticate quickly and securely and gain instant access
to the OD repository; the relevant patron information can also be carried into the data repository
through the PKU-IAAA. Furthermore, the PKUL enabled a group download function so that users can download
multiple files within one data set with one request, whereas the original Dataverse only allowed users to download
one file per request. The PKUL also created two types of user account: a regular user account and an
advanced user account. A regular user account can be upgraded to an advanced user account, with more privileges,
when the user submits an application and provides the additional required information.


A bilingual interface is essential for our users since our repository is open to anyone in the
world. The original Dataverse provided only a monolingual interface. Researchers often publish their research
outputs in English to increase the visibility of their research. From this perspective, English is
an ideal candidate for the user interface. However, the majority of our users come from China, and
Chinese is their native language and more comfortable for them to use. The PKUL therefore decided
to make the Dataverse repository interface and metadata support both English and Chinese. The user
interface can be switched between Chinese and English, and search results can be displayed in both English
and Chinese. The notifications sent to users are also customized in a bilingual format.
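

The language switching and bilingual notifications described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `MESSAGES` catalogue, its keys, and the helper names are hypothetical, and the sketch does not show how Dataverse or the PKU-ORDR actually stores its interface strings.

```python
# Hypothetical message catalogue; keys and strings are illustrative only.
MESSAGES = {
    "en": {"download": "Download", "dataset": "Data set", "login": "Log in"},
    "zh": {"download": "下载", "dataset": "数据集"},
}
DEFAULT_LOCALE = "en"


def translate(key: str, locale: str) -> str:
    """Look up a UI string, falling back to the default locale, then to the key."""
    for candidate in (locale, DEFAULT_LOCALE):
        text = MESSAGES.get(candidate, {}).get(key)
        if text is not None:
            return text
    return key


def bilingual_notification(key: str) -> str:
    """Render one notification line in both languages, Chinese first."""
    return f"{translate(key, 'zh')} / {translate(key, 'en')}"


print(translate("download", "zh"))
print(translate("login", "zh"))  # no Chinese entry, so the English string is used
print(bilingual_notification("dataset"))
```

Falling back to the default locale keeps the interface usable while translations are still incomplete, which matters when a localized platform has to track new upstream releases.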


Regarding usage statistics, the original Dataverse system only tracked the number of downloads, which
fell far short of the PKU data providers’ statistical requirements. Therefore, the PKUL
enabled logging of user applications, administrator verification, user browsing, downloads, and so on.
ElasticSearch is used to index the logs so that data providers can query and download real-time data.
Meanwhile, Baidu Analytics was implemented in the data repository pages to analyze data such as user
sources, devices used, search keywords, and pages visited.


Furthermore, the PKUL hosted two national
data contests to promote open research data repositories. A contest module was added to the Dataverse
repository to facilitate user enrollment and data use. Participants were allowed to form teams to enroll
in the contest, submit their papers, and access research data directly using their user accounts for the
data repository. The contest module also provided functions such as a contest homepage and a paper
display gallery.
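

The usage logging described above can be pictured as a stream of small JSON documents that are later aggregated per data set. The following Python sketch uses only the standard library; the field names and helper functions are assumptions made for illustration, and in a deployment like the one described such documents would be indexed in a search engine rather than held in a list.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def make_event(user: str, action: str, dataset: str, when: datetime) -> dict:
    """Build one JSON-serialisable log document (field names are assumed)."""
    return {
        "user": user,
        "action": action,  # e.g. "apply", "verify", "browse", "download"
        "dataset": dataset,
        "timestamp": when.isoformat(),
    }


def counts_by_action(events: list[dict], dataset: str) -> Counter:
    """Count events per action for one data set, as a provider report might."""
    return Counter(e["action"] for e in events if e["dataset"] == dataset)


now = datetime(2019, 7, 1, tzinfo=timezone.utc)
events = [
    make_event("u1", "browse", "ds-A", now),
    make_event("u1", "download", "ds-A", now),
    make_event("u2", "download", "ds-A", now),
    make_event("u2", "browse", "ds-B", now),
]

print(json.dumps(events[0], indent=2))
print(counts_by_action(events, "ds-A"))
```

Recording each action as its own document, rather than incrementing a single download counter, is what allows a provider to ask new questions of the same logs later.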


Additionally, the PKUL added many other functions to Dataverse. Dataverse 4.0 only provided Handle
identifier registration, so the PKU-ORDR added DataCite DOI registration for data sets. Our module was later
adopted by Harvard University and other institutions that use Dataverse. Some data sets within
the PKU-ORDR are of high quality; for example, the China Family Panel Studies data set has been cited by
numerous research papers. The PKUL therefore used an API to interoperate with the PKU-IR, retrieving papers from
the PKU-IR and displaying those associated with the data sets on the PKU-ORDR platform. Also,
Dataverse 4.0 did not support homepage customization, so the PKUL developed a custom homepage,
and this homepage customization technology has since been adopted by Harvard University. As shown
in Figure 2, numerous efforts were made between 2014 and 2019, and key milestones are highlighted
in the diagram.
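

To make the DOI registration step mentioned above more concrete, the sketch below assembles a metadata record containing the properties that the DataCite Metadata Schema treats as mandatory (identifier, creators, titles, publisher, publication year, resource type) and checks that none is empty. The JSON layout loosely follows the style of DataCite's REST interface, but this is an assumption for illustration, not the PKU-ORDR module itself; the DOI, names, and URL are placeholders, and no network request is made.

```python
# Attribute names assumed for this sketch, mirroring the mandatory
# properties of the DataCite Metadata Schema.
MANDATORY = ("doi", "creators", "titles", "publisher", "publicationYear", "types")


def build_doi_payload(doi, title, creators, publisher, year, landing_url):
    """Assemble a registration record for one data set (illustrative layout)."""
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "doi": doi,
                "titles": [{"title": title}],
                "creators": [{"name": name} for name in creators],
                "publisher": publisher,
                "publicationYear": year,
                "types": {"resourceTypeGeneral": "Dataset"},
                "url": landing_url,  # landing page the DOI should resolve to
            },
        }
    }


def missing_mandatory(payload: dict) -> list[str]:
    """List mandatory attributes that are absent or empty."""
    attrs = payload["data"]["attributes"]
    return [name for name in MANDATORY if not attrs.get(name)]


payload = build_doi_payload(
    doi="10.5072/example-0001",  # placeholder DOI, not a real identifier
    title="Example survey data set",
    creators=["Example Research Team"],
    publisher="Example University Library",
    year=2015,
    landing_url="https://example.org/dataset/1",
)
print(missing_mandatory(payload))
```

Checking the mandatory properties before submission keeps incomplete records from being sent for registration.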


[Figure 2 content: a timeline of milestones ending in 2019, with labels including the PKU user needs survey, the RDM software survey and test, cooperation with the ISSS, the official launch of the open research data platform, the RDM service case survey, the data information literacy lecture, data set related publications, the platform homepage, the first and second national university data-driven research contests, and the platform upgrade.]


Figure 2. Research Data Management (RDM) service framework development diagram. 


3.4 Usage, Data Curation, Skills Training, and Services Promotion 
3.4.1 Usage 


The PKU-ORDR enhanced the PKUL's infrastructure for data storage and sharing. Through collaboration
with academic departments on campus, the PKU-ORDR has collected numerous high-quality data sets;
examples include the China Family Panel Studies, the China Health and Retirement Longitudinal Study, the Beijing
Area Study, the Comprehensive Language Knowledge Base, and AutismKB, an evidence-based knowledge base
of autism.


As of August 2020, the PKU-ORDR has released 66 Dataverses, 305 data sets, and 2,036 data files.
The total number of downloads has exceeded 620,000. The average number of daily visitors is
about 500, and the average number of page views is 2,700. In recent years, visitors from more than
89 countries have visited the repository, and the top five countries are China, the
United States, the United Kingdom, Japan, and South Korea. The number of registered users has reached
32,000. Figures 3 and 4 show the top 10 institutions in terms of registered users in China
and abroad, respectively. As shown in the figures, these registered users came from prestigious research universities
in China and elsewhere in the world.
Figure 3. Top 10 domestic institutions in terms of registered users.
Figure 4. Top 10 overseas institutions in terms of registered users.


3.4.2 Data Curation and Skills Training 


To curate past research data for future use, the PKUL offered several data curation services.
In collaboration with the ISSS, the PKUL hosted an RDM Seminar in 2015. The PKUL invited two experts, one
from the Inter-University Consortium for Political and Social Research (ICPSR), USA, and one from the UK Data
Archive, to deliver data management training. The trainees were teachers and students
from PKU, as well as from peer universities in China. The PKUL also sent librarians to participate in
relevant training activities hosted by other universities to improve their RDM curation skills. To improve
students’ data search skills, the PKUL offered one-hour workshops to teach students to identify data sources
and use scientific methods to acquire relevant research data, statistical data, and Internet data. The PKUL
also provided a series of lectures to teach students how to use data analytics tools.


3.4.3 Service Promotion 


To improve the visibility of its data services, the PKUL promoted the use of the PKU-ORDR
through various channels, including the PKU homepage, the public social media accounts
of student groups, the annual conferences of ISSS, and various RDM-related domestic
and international conferences. To improve data accessibility, the PKU-ORDR provided metadata to
re3data.org, an international registry of research data repositories; to DataCite Search and the Data
Citation Index, which are data discovery systems; and to search engines such as Baidu and Google.
Additionally, in collaboration with other units on campus and the National Information Center, the PKUL
hosted two national contests entitled “National Data-Driven Research Contests for Colleges and
Universities” in 2018 and 2019 to promote the use of the PKU-ORDR and RDM services and to train
students’ data searching and acquiring skills.


The contest included five stages: training workshops and lectures, enrollment, paper submission, paper
evaluation, and oral defense. During the first stage, the organizers provided training on contest rules, data
analysis and mining, data management and sharing, and data resources and acquisition. During the enrollment
stage, the contestants registered in groups, submitted their selected topics, and applied for the research data 
in the PKU-ORDR. During the research paper submission stage, the contestants conducted research using 
the data from the PKU-ORDR or collected original data on their own, wrote essays, and submitted their 
papers together with the data to the organizers. During the paper evaluation stage, the organizers first
conducted a formal assessment and plagiarism checks of the essays, and the qualifying papers were then
evaluated by experts invited by the organizers. Each essay was reviewed and graded by two experts.
The papers were ranked by grade. During the on-site oral defense stage, several top-ranked
teams delivered their statements and reports in person and were then evaluated by more than 10
experts to decide the final ranking. After the contest, the winning teams shared their research data and
reported at the Jing Ling Big Data Summits, and excellent essays were published in Chinese core journals
in a special topic issue.
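
The paper evaluation step described above (two expert grades per essay, then ranking by grade) can be illustrated with a minimal sketch. The paper does not state how the two grades were combined, so averaging them is an assumption made here for illustration; the function names, team identifiers, grade scale, and number of finalists are likewise hypothetical and are not contest data.

```python
from statistics import mean


def rank_papers(grades):
    """Rank papers by the mean of their two expert grades, highest first.

    `grades` maps a paper identifier to a (grade_a, grade_b) tuple.
    Averaging the two grades is an assumption; the source only says that
    each essay was graded by two experts and ranked by grade.
    """
    scored = [(paper, mean(pair)) for paper, pair in grades.items()]
    # Sort by descending mean grade; break ties by identifier so the
    # ordering is deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


def select_finalists(ranked, n):
    """Return the identifiers of the top-n papers for the oral defense."""
    return [paper for paper, _ in ranked[:n]]


# Hypothetical grades on a 0-100 scale (not real contest data).
example_grades = {
    "team_a": (88, 92),
    "team_b": (75, 81),
    "team_c": (90, 94),
    "team_d": (88, 92),
}

ranked = rank_papers(example_grades)
finalists = select_finalists(ranked, 2)
print(finalists)
```

In this sketch the tie between two teams with equal mean grades is broken alphabetically; a real contest would need an explicit tie-breaking rule, which the source does not describe.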


The contests attracted numerous students from many other major research universities in the country.
The first contest recorded an enrollment of nearly 600 teams comprising about 2,000 contestants
from more than 160 universities and colleges. They came from 28 provinces and majored in 59 disciplines
such as Computer Science, Information Technology, Management Science & Engineering, Applied
Economics, Statistics, Public Health & Preventive Medicine, Library and Information Science, and Chinese
History. In the end, 289 teams comprising about 1,000 participants submitted their research papers. The
second contest had an enrollment of 600 teams comprising 1,704 participants from 29 provinces; their
disciplines covered Applied Economics, Computer Science, Information Technology, Statistics, Sociology,
Library and Information Science, Management Science & Engineering, and Public Health & Preventive Medicine.


Through the contests, students from different disciplines gained experience on the same OD
platform. The competition also greatly promoted data-driven research paradigms and the visibility of
the PKU-ORDR. Between December 2017 and May 2018 during the two contests, online visitors to the
PKU-ORDR increased tenfold, registered users increased fivefold, and data
downloads increased sevenfold. More and more external websites linked to the
PKU-ORDR. The ranking and exposure of the data in the PKU-ORDR improved greatly in search
engines. At the same time, the original data submitted by the contestants greatly enriched the content of
the data repository.


4. CONCLUSIONS 


This paper reviewed the implementation process of the PKU-ORDR and the creation of the RDM services
provided by the PKUL. Through the review, the authors found that needs assessment and
collaboration are vital to the success of a library-based university-wide RDM project. Raising researchers’
awareness of OS and OD is critical. Software identification and selection is a complicated and
time-consuming process. The software must meet essential criteria such as stability and sustainability.
Communication is critical throughout the process, particularly with administrative units and other academic
units on campus. Data curation programs such as workshops, lectures, and contests can be developed
to improve researchers’, students’, and librarians’ data searching and acquiring skills and to promote services
on campus and to larger research communities. RDM policies must be created and put in place. In short,
this is a cumulative learning process in both theory and practice. The authors will
feel rewarded if this practical paper can offer some insights to academic libraries planning to
implement their own OD repository and/or RDM services for their home institutions. Although the PKUL has
made great efforts in RDM construction and contributed to the OS and OD communities on campus
and across China, it still has a long way to go. Many challenges and
opportunities lie ahead for libraries and librarians.